Reward Maximization Assessed Using a Sequential Patch Depletion Task in a Large Sample of Heterogeneous Stock Rats

Choice behavior requires animals to evaluate both short- and long-term advantages and disadvantages of all potential alternatives. Impulsive choice is traditionally measured in laboratory tasks by utilizing delay discounting (DD), a paradigm that offers a choice between a smaller immediate reward, or a larger more delayed reward. This study tested a large sample of Heterogeneous Stock (HS) male (n = 896) and female (n = 898) rats, part of a larger genetic study, to investigate whether measures of reward maximization overlapped with traditional models of delay discounting via the patch depletion model using a Sequential Patch Depletion procedure. In this task, rats were offered a concurrent choice between two water “patches” and could elect to “stay” in the current patch or “leave” for an alternative patch. Staying in the current patch resulted in decreasing subsequent reward magnitudes, whereas the choice to leave a patch was followed by a delay and a resetting to the maximum reward magnitude. Based on the delay in a given session, different visit durations were necessary to obtain the maximum number of rewards. Visit duration may be analogous to an indifference point in traditional DD tasks. While differences in traditional DD measures (e.g., delay gradient) have been detected between males and females, these effects were small and inconsistent. However, when examining measures of reward maximization, females made fewer patch changes at all delays and spent more time in the patch before leaving for the alternative patch compared to males. This pattern of choice resulted in males having a higher rate of reinforcement than females. Consistent with this, there was some evidence that females deviated from the optimal more, leading to less reward. Measures of reward maximization were only weakly associated with traditional DD measures and may represent distinctive underlying processes. 
Taken together, females' performance differed from males' with regard to reward maximization in ways that were not observed using traditional measures of DD, suggesting that the patch depletion model was more sensitive to modest sex differences than traditional DD measures in a large sample of HS rats.


Introduction
All animals face a simple, yet essential, choice between staying or leaving a resource in search of potentially better alternatives. Models of foraging behavior from behavioral ecology seek to understand choice behavior by assuming individuals maximize their net rate of energy gain when behaving optimally [1][2][3]. Patch leaving models specifically examine decisions about how much time a forager will devote to a dwindling resource (depleting patch) before departing to locate the next such resource. The Marginal Value Theorem (MVT) 3 predicts that animals behaving optimally will leave when the rate of reward in the current patch equals the average rate of reward in the entire habitat. MVT predictions are based primarily on travel time between patches, which delays the availability of food in these alternative patches, but there are potentially other negative consequences associated with leaving the depleting patch that impact choice and postpone leaving. For example, animals risk predation, and the effort costs of traveling to an alternative patch may not vary linearly with travel time 4. Foraging theorists have also suggested that food in alternative patches may be discounted because of the uncertainty about its availability and accessibility 5,6. These considerations would suggest that foragers may stay longer than predicted by MVT in the depleting patch. Indeed, in many studies, animals exhibit this "overharvesting" behavior, operationally defined as a preference for smaller, more immediate rewards in a current patch relative to the potentially larger rewards available after a delay of traveling to a new patch.
In laboratory studies of behavioral economics, impulsive choice has also been studied using Delay Discounting (DD) tasks, in which individuals choose between smaller, sooner and larger, later rewards. A discounting function can be constructed to describe how choices change as a consequence of variations in the duration of the delay to the larger reward. The gradient of the function or area-under-the-curve function provides a metric of the extent to which delays affect choices. Performance on traditional DD tasks is both genetically 15 and behaviorally associated with various aspects of the drug abuse continuum in humans [16][17][18][19][20][21][22][23], as well as numerous other psychopathologies including attention deficit hyperactivity disorder 24. Indeed, several researchers have suggested that the excessive devaluation of delayed rewards, as assessed by DD tasks, is a transdiagnostic feature of psychiatric disorders 25.
Choice behavior in both patch foraging and DD tasks may tap similar reward valuation processes that drive individuals to maximize reward while also devaluing rewards that are not currently present (delayed rewards in alternative food patches). However, few studies have simultaneously measured both reward maximization and DD processes in a single task. Accordingly, we examined choice behavior in a large sample of males and females, using the sequential patch depletion procedure 26, to evaluate the extent to which behavior on the sequential patch depletion task maximized the net rate of reinforcer gain, as derived from the MVT. In addition, we investigated the degree to which behavior could be described using traditional DD metrics (gradient of the DD function, area under the discounting curve). Male and female outbred Heterogeneous Stock (HS) rats, known for their high genotypic and phenotypic variability 27,28, chose between a smaller, sooner reward by staying in a rapidly depleting patch and a delayed but larger reward by investing the time needed to change to a non-depleted patch. The large sample size provides the power to identify subtle effects and supports an in-depth analysis of behavior in what is, to the best of our knowledge, the first investigation of patch-leaving performance in HS rats.

Delay Discounting
Performance in the Sequential Patch Procedure was first analyzed using a traditional DD approach, which focused on the rejection volumes at the different changeover delay (COD) travel times. A two-way mixed ANOVA with sex as the between-subject factor and delay (0, 6, 12, 18, and 24 s) as the within-subject factor indicated a significant main effect of delay [F(4,7168) = 4,823.325, p < .001] and a significant interaction between delay and sex [F(4,7168) = 11.102, p < .001; effect sizes: delay, ηp² = 0.729; delay × sex, ηp² = 0.006] on rejection volume/indifference point. Post-hoc pairwise comparison LSD tests showed that the indifference points were significantly different for each delay, such that indifference points decreased as delay increased. Independent samples t tests with Bonferroni corrections revealed that males had significantly higher rejection volumes than females at 6- and 12-s delays [6 s, t(1792) = 2.632, p < 0.01; 12 s, t(1792) = 3.287, p < .001] (Fig. 3a). These data suggest that males discounted less than females when the travel costs (CODs) were 6 and 12 s; however, the effect sizes were small and differences did not occur at every delay. The lack of difference at 0 s indicates that males and females do not differ on the bias parameter of the hyperbolic equation (b; Fig. 3b).
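As a consistency check, the two effect sizes reported above (0.729 and 0.006) can be recovered from the F ratios and degrees of freedom if they are read as partial eta squared. The sketch below assumes that measure; the function name is illustrative and not taken from the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# F ratios and degrees of freedom as reported for rejection volume
delay_effect = partial_eta_squared(4823.325, 4, 7168)
interaction_effect = partial_eta_squared(11.102, 4, 7168)

print(round(delay_effect, 3), round(interaction_effect, 3))
```

Under this reading, delay accounts for most of the variance in rejection volume, whereas the delay × sex interaction accounts for well under 1%.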
The small sex difference was underscored when the hyperbolic discounting function was fit to the data. The differences at the two delays were insufficiently large to be reflected in sex differences in the hyperbolic discounting function gradient (k; Fig. 3c) or AUC (Fig. 3d).

Patch Utilization
The small differences between males and females in patch leaving at 6- and 12-s CODs were accompanied by more widespread differences in resource acquisition. A two-way mixed ANOVA with sex as the between-subject factor and delay (0, 6, 12, 18, and 24 s) as the within-subject factor was conducted to determine if sex moderated the rate of water reinforcement (µL/min [...] significantly different for each delay, indicating that rats increased the amount of time spent in the patch as the delay increased. Independent samples t tests with Bonferroni corrections revealed that females spent significantly more time in the patch than males before leaving for a new patch at 6- and 12-s delays [6 s, t(1792) = −4.312, p < .001; 12 s, t(1792) = −5.375, p < .001] (Fig. 4c). Taken together, these data indicate sexually dimorphic performance on the characteristics of patch visits on the sequential patch depletion task, with males having a higher rate of water reinforcement, while females made fewer patch changes and remained in a patch for longer periods of time, resulting in significantly lower rejection volumes/indifference points at some delays.
To assess whether these patch utilization patterns generally reflected a reward maximization strategy (according to the MVT), we assessed differences in deviations from the optimal time in patch; a two-way mixed ANOVA identified a significant interaction between delay and sex [F(4,7168) = 4.83, p < .001] and significant main effects of delay [F(4,7168) = 3478.69, p < .001] and sex [F(1,1792) = 13.293, p < .001]. As can be seen in Fig. 5a, rats spent less time in the patch than predicted by our reward maximization calculation when the CODs were short, but this deviation declined systematically until the time in patch approximated optimal predictions (COD of 12 s) and then exceeded predictions toward underharvesting (leaving the patch sooner than optimal). Post-hoc independent samples t tests with Bonferroni corrections revealed that the deviation from the optimal time in patch was more extreme for females than males at delays of 0-18 s [0 s, t(1792) = 2.485, p < .01; 6 s, t(1792) = 4.312, p < .001; 12 s, t(1792) = 5.375, p < .001; 18 s, t(1792) = 2.74, p = .015] (Fig. 5a). These data are consistent with females having a lower rate of reinforcement than males.
To further evaluate reward maximization, the observed rejection volume was compared to the optimal rejection volume (predicted by the MVT); a two-way mixed ANOVA identified a significant interaction between delay and sex [F(4,7168) = 10.109, p < .001] and a significant main effect of delay [F(4,7168) = 1428.742, p < .001]. Females deviated from the optimal rejection volume significantly more than males at delays of 6 and 12 s [6 s, t(1792) = −2.672, p < .001; 12 s, t(1792) = −3.287, p = .005], whereas males deviated significantly more than females at 24 s [t(1792) = 2.032, p < .01] (Fig. 5b). Relative to MVT predictions, both sexes exhibited overharvesting (leaving the patch after it would be considered optimal) at 0- and 6-s delays, but at longer delays (12-24 s), there was a shift toward optimal. This altered patch leaving as a function of COD is clearly illustrated by changes in the frequency distributions of rejection volumes across delays; the percentages of animals overharvesting decreased as delay increased (Fig. 6). These data indicate that rats' behavioral choices deviate from the values predicted by the MVT.
Table 1 shows the correlations amongst traditional DD composite variables (k and AUC), patch utilization composite variables (patch changes, time in patch, and rate of water reinforcement), and reward maximization composite variables (time and volume deviations) in male and female rats. For both sexes, there were strong significant negative associations between k and AUC values (p values < .001; Table 1). DD composite variables were significantly correlated with the composite scores for patch changes for males and females (p values < 0.05).
However, while significant associations were observed between patch utilization composite variables (time in patch, patch changes) and DD composite variables (k and AUC; all p values < 0.05), these associations appeared weaker than the associations observed between patch utilization variables and reward maximization variables (time and volume deviation; all p values < 0.05). Weak to no associations were observed between rate of water reinforcement and either DD or reward maximization variables (r = 0.3 < 0.03, see Table 1).

Strengths Of Associations Between Patch Utilization, Reward Maximization, And DD
To compare the strength of the associations between DD variables and reward maximization for each of the patch utilization variables, we applied the method described by Meng et al. 29, which compares sets of non-independent overlapping correlation coefficients via a z-test procedure. For most comparisons, there were differences in the strength of associations (see Table 2 for z-scores). Correlations between patch utilization composite scores and reward maximization were significantly stronger than correlations between patch utilization and DD variables in both males and females (Tables 1 and 2). For example, correlation coefficients between number of patch changes and k (males: r = −0.30; females: r = −0.35) were significantly weaker than correlation coefficients observed between number of patch changes and time deviation (males: r = 0.87; females: r = 0.88; see Table 2 for z-scores). Similarly, correlation coefficients between number of patch changes and AUC (males: r = 0.45; females: r = 0.47) were significantly weaker than correlation coefficients observed between number of patch changes and time deviation (males: r = 0.87; females: r = 0.88; see Table 2 for z-scores). This pattern in the strength of associations was repeated for time in patch (patch utilization correlation coefficients: reward maximization > DD, as shown in Table 1). A different pattern emerged for rate of water reinforcement. Here, both DD and reward maximization variables were either weakly or not significantly correlated with rate of water reinforcement. Time deviation correlation coefficients were similar to those observed for k, and AUC correlation coefficients were similar to correlation coefficients for volume deviation. These results were confirmed using correlation comparison approaches in the R package 30,31.
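The z-test for two overlapping, non-independent correlations can be sketched as follows. This is a minimal illustration of the Meng, Rosenthal, and Rubin formula, not the code used for Table 2. The correlation between the two predictors (here, k and time deviation) is not reported in the text, so the value of -0.25 is a hypothetical placeholder, and whether the original comparisons used signed or absolute coefficients is not stated.

```python
import math


def meng_z(r1, r2, r_x, n):
    """z statistic comparing r(Y, X1) = r1 with r(Y, X2) = r2.

    r_x is the correlation between X1 and X2; n is the sample size.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z transforms
    mean_r_sq = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - r_x) / (2.0 * (1.0 - mean_r_sq)), 1.0)  # f is capped at 1
    h = (1.0 - f * mean_r_sq) / (1.0 - mean_r_sq)
    return (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r_x) * h))


def two_sided_p(z):
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Males (n = 896): patch changes vs. time deviation (0.87) and vs. k (-0.30).
# r_x = -0.25 is a HYPOTHETICAL value chosen only to make the example run.
z = meng_z(0.87, -0.30, r_x=-0.25, n=896)
print(z, two_sided_p(z))
```

Because both correlations share the patch utilization variable and come from the same animals, an independent-samples comparison of Fisher z values would ignore their dependence; the term involving r_x is what accounts for it.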
Taken together, these data indicate that while standard DD variables are associated with patch utilization, these associations are weak relative to associations between patch utilization and reward maximization in both males and females.

Discussion
The Marginal Value Theorem (MVT) suggests that optimal foragers will choose to stay in a patch until the rate of reward falls below the average rate of reward in alternative patches 3. Consequently, longer travel delays between patches make staying in a depleting patch a more optimal choice because the longer delay depresses the average rate of reward, resulting in foragers remaining in the patch for a longer period of time. Conversely, shorter travel delays between patches make staying in a depleting patch for a long period of time a less optimal choice, resulting in "leave" choices occurring sooner. This is supported by the current experiment, in which CODs (representing travel delays) altered choice behavior. Increasing delays resulted in a decrease in the number of patch changes and an increase in time in patch as rats made more "stay" choices. The sequential patch depletion task is a valuable paradigm for understanding important processes underlying optimal behavior because it has strong ethological validity. Another important strength of this paradigm is the ability to examine reward maximization in greater depth. By assessing the percent deviation from optimality, which normalizes the data to map reward maximization for various delays, we found that animals tended to overharvest at short delays, as indicated by remaining in a patch longer than optimal. Conversely, rats underharvested when the delays were longer, as indicated by animals leaving the patch sooner than optimal. Notably, the greatest deviation from the optimal time in patch occurred when the delay was shorter than 6 s, that is, at 0 s. These data may imply that despite the absence of a travel cost, some other unidentified variable influences the rat's choice to stay in the current patch. While the focus of MVT is on travel costs, other factors can also contribute to choices to stay in a patch, such as predation and energy expenditure 1,2,4.
In the current study, effort needed to change patches may have been weighted more at shorter delays. It is unclear if these variables are contributing to the sex differences found in this study. Future research is needed to tease apart how these factors may contribute to overharvesting.
Our data demonstrate that foraging behavior and traditional measures of impulsive choice, such as those provided by traditional DD indices, may share some characteristics of behavioral responding but may more strongly reflect distinct behavioral processes. We observed significant associations between DD and reward maximization variables. However, of note, reward maximization variables were more strongly associated with patch utilization variables than the DD variables. Consistent with this, Hayden, Pearson, & Platt 9 showed that the performance of monkeys in a foraging task fit better with the MVT model than with the hyperbolic DD function; they obtained a similar result when testing a patch-leaving foraging task interleaved with a traditional DD task 32.

Sex Differences In DD, Patch Utilization And Reward Maximization
Greater DD, and by extension impulsive choice, is indicated by longer stays in a depleting patch with smaller rewards. Although females had significantly lower indifference points than males at some delays, the effect sizes were small, and no differences in hyperbolic k values or AUC were found. Relatively few animal studies have examined sex differences in DD, and these studies have produced conflicting results. In animals, larger DD has been found in female rats 33,34 and mice 35, whereas other studies have identified larger DD in male rats 36,37, age-dependent sex differences 38, or no sex differences [39][40][41][42]. Similar discrepancies have been reported in human studies, with greater discounting observed in women in some studies 15,23,43,44 and in men in others [45][46][47]. Even more studies of humans have found no differences 48-55. These disparate findings call for a more comprehensive examination of these differences in impulsive choice, as a foundation on which to explore the interrelationships between gender and DD in humans as well as their roles in psychopathologies (e.g., substance abuse).
In the present study, females had a lower rate of reinforcement than males, corresponding to fewer patch changes and more time spent in patches than males. Furthermore, while both sexes deviated from optimal performance, females deviated from optimality more than males at several delays. These data are consistent with findings of sex differences in humans in the Iowa Gambling Task, in which men had a greater preference than women for cards that were advantageous in the long term 56. From this, van den Bos 56 posited that males tend to focus more on long-term goals, shifting from exploration to exploitation, whereas females exhibited greater exploratory behavior 35,57. The findings from the current study offer nuances to this hypothesis. The male rats in our study had a greater rate of reinforcement than females, supporting the interpretation that the males focused on long-term goals. However, females did not exhibit greater exploratory behavior, as they made fewer patch changes and stayed in the patch significantly longer than males. Instead, the greater rate of switching between patches observed in males may reflect greater behavioral flexibility, whereas females may be demonstrating more perseverative behavior 58. Reports of sex differences in perseverative behaviors have been inconsistent [59][60][61][62], and thus, further research is needed to reconcile these discrepancies. This is of particular importance because perseveration is a feature of psychological disorders (e.g., schizophrenia, autism, OCD and drug addiction) 63, and understanding these processes may have implications for sex-dependent vulnerability.
There are a variety of possible explanations for why females stayed longer in a patch. For example, there are sex differences in energy expenditure 64,65 , that may result in a female bias to preserve energy. Greater "overharvesting" observed in female rats may be an inaccurate interpretation: females may have exhibited appropriate levels of harvesting given environmental or biological factors, such as those related to reproductive success which made staying in the patch a more advantageous choice. Furthermore, their strategy may be to fully deplete the patch and exploit all resources before moving on to the next patch, as overharvesting may not abrogate visiting the patch in the future when resources have been replenished. In addition, females may have a differential sensitivity to tracking a changing environment, or to cues of change 66 . Indeed, Tropp & Markus 67 found males and females utilize cues differently in various environments. When animals are presented with diminishing rewards upon the choice to stay in a patch, not only are they being reinforced, but they are also gaining new information about the quality of the reward, an important component of the economics of choice behavior 1,2 . Finally, females may tend to choose safer options, which is supported by studies in rats 68 .
Taken together, these data suggest that the patch depletion model, and by extension reward maximization, was more sensitive to modest sex differences than traditional DD tasks and metrics.

Future Directions
While we report differences between males and females in reward maximization, it is important to note that we did not track estrous cycles in the females, so we cannot determine if this affected their performance on this task. To date, there is little evidence regarding the hormonal role in mediating these behavioral processes. Here, we present statistically significant, but admittedly modest, sex differences in performance on the sequential patch depletion task. This may be, in part, due to the role of the estrous cycle in performance on these tasks. In keeping with the literature exploring sex differences in DD, there is limited and conflicting evidence for the role of the estrous cycle in DD 33,69,70. Future research is needed to determine the impact of the estrous cycle on performance in the patch depletion procedure. The sequential patch model assumes that foragers have perfect knowledge of the model's parameters, identified by Stephens 2 as the "complete information assumption." While the rats in our study had substantial training, this assumption may influence how we interpret their behavior. The differences between rewards associated with optimal rejection volumes were small, which may have contributed to the deviations from optimality. This issue has been noted previously in analyses of optimal performance on progressive schedules with reset 71-73. This is the first report of sex differences in a large sample of Heterogeneous Stock rats tested using the sequential patch depletion procedure. By using a large sample of Heterogeneous Stock rats, we showed that females' performance differed from males' with regard to reward maximization in ways that were not revealed by traditional measures of DD.
Notably, frequency distributions of rejection volume indicate that there was also sizeable variability in rejection volumes within each sex, which enables us to explore individual differences or genetic/environmental variables that contribute to performance in this task in future studies. Furthermore, measures of reward maximization were only weakly associated with DD variables and thus may be mediated by different underlying processes. Taken together, these data show the utility of the sequential patch depletion procedure for measuring choice and may have implications for vulnerability to the development of psychiatric diseases.
Experiments were conducted in batches of approximately 100 rats (4 to 5 weeks of age) at 3- to 4-month intervals. Rats were quarantined for 1 to 2 weeks upon arrival at the University at Buffalo before being transferred to colony housing. The colony room was maintained at a constant temperature (22 ± 1°C) and humidity (~55% ± 5%), and lights were on a reverse cycle (lights on from 19:00 to 07:00).

Subjects
Rats were housed in same-sex pairs in plastic cages (42 × 22 × 19 cm) lined with bedding (Aspen Shavings). Prior to the start of the experiment, rats (n = 1590) were first tested on four behavioral tasks (social reinforcement, locomotor response to novelty, light reinforcement, and choice reaction time; data are not reported here). A subset of animals (n = 204) was not tested on the social reinforcement test because it was introduced after the first two batches had already been tested. At the onset of data collection, the mean (± SEM) age of the rats was postnatal day 136.58 ± 0.29.
Behavioral testing was conducted 6 days/week (Monday through Saturday) during the dark phase of the light-dark cycle between the hours of 08:30 and 12:30. Food (Teklad Laboratory Diet #8604) was available ad libitum in the home cages. Access to water was restricted to 30 min immediately following testing on Monday through Friday. At the end of the testing on Saturday, animals were given free access to water until approximately 12:30 h on Sunday (approximately 20 hr prior to testing on Monday).
This study was conducted in accordance with protocols approved by the Institutional Animal Care and Use Committee at the University at Buffalo, and animals were treated in compliance with the Guide for the Care and Use of Laboratory Animals, and the study is reported in accordance with the ARRIVE guidelines 75 .

Apparatus
Testing occurred in 24 locally constructed operant chambers (24 × 22 × 20 cm; Fig. 1) housed in sound-attenuating cabinets (Model # 3000000187, Coleman, Wichita, KS), which were previously described in detail 76. Briefly, the chambers had stainless-steel rod floors, aluminum back and side walls, and a Plexiglas front wall and top. Each test chamber had three snout poke receptacles (4 cm in diameter) located in the back and side walls. Infrared photobeam detectors, located 1 cm from the snout poke receptacle entrance, were used to record snout pokes. Three stimulus lights were located above each snout poke receptacle, and a fourth light was located in the ceiling of the test chamber. A Sonalert tone generator (SC628EJR; Mallory Sonalert Products, Indianapolis, IN) mounted on the right wall provided a pulsed 1.9-kHz tone. Acrylic dishes were located inside the left and rear snout poke receptacles and were connected to Tygon tubing, which delivered precise amounts of water from 60 ml syringes mounted on two single-speed syringe pumps (3.33 rpm, PHM-100; Med Associates, St. Albans, VT) external to the sound-attenuating cabinet.
The apparatus was controlled by MED-PC IV software (RRID:SCR_012156) with 1 ms temporal resolution running on computers with Microsoft Windows operating systems. Equipment was tested before test sessions and following any session in which a rat earned fewer than 30 reinforcers.

Sequential Patch Depletion Procedure
Behavior was measured using a sequential patch depletion procedure that was previously described (Fig. 2a) 26,77. Briefly, during this task, water-restricted rats were offered a concurrent choice between two water "patches" (left- or back-wall snout poke receptacles) and could elect to "stay" in the current patch or "leave" for an alternative patch at any time.
A snout poke in the receptacle (i.e., entering the patch) resulted in immediate delivery of 150 µl water. If rats made a choice to "stay" in this patch, successively smaller amounts of water were delivered. To simulate patch depletion, the volume of water for successive "stay" choices was reduced by 20% for each reinforcer presentation. For example, the first reward volume was 150 µl of water, the second reward was 120 µl, and the third reward was 96 µl, etc., with each reward delivery separated by a minimum of 4 s (Fig. 2a and 2b). Once in a patch, water was available according to a modified Fixed Interval (FI) 4-s schedule; each water delivery was followed by a 4-s interval during which the reinforcer was unavailable. However, successive "stay" choices were achieved in one of two ways: rats could emit a snout poke to the same receptacle as the previous response following the 4-s interval (traditional FI contingency) or the rat could remain with its snout in the receptacle for the duration of the 4-s interval (analogous to a Fixed Time or FT schedule).
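The depletion rule described above is geometric: each successive reward is 80% of the previous one. A minimal sketch follows; the function name and defaults are illustrative, and only the 150 µl starting volume, the 20% reduction, and the 4-s minimum spacing come from the procedure.

```python
def patch_rewards(n_rewards, initial_ul=150.0, retention=0.8, interval_s=4.0):
    """Return (earliest time since patch entry in s, volume in µl) for each reward.

    Times are the earliest possible delivery times, assuming the rat
    collects each reward as soon as the 4-s interval has elapsed.
    """
    return [(interval_s * i, initial_ul * retention ** i) for i in range(n_rewards)]


for time_s, volume_ul in patch_rewards(4):
    print(f"t = {time_s:4.1f} s  volume = {volume_ul:6.1f} µl")
```

The first three volumes reproduce the 150, 120, and 96 µl sequence given in the text.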
If a rat made the choice to "leave" a patch by poking its snout in the alternative receptacle ("alternative patch"), a changeover delay (COD) was imposed to simulate "travel" cost 78. During the COD, reinforcers were not available at either location regardless of responding. When the rat left a patch to travel to the alternative patch, the abandoned patch was replenished (the volume of the water was reset to 150 µl for the first reinforced response on returning to that patch).
When a patch change occurred with a 0-s COD, the stimulus light above the abandoned patch was extinguished, and the stimulus light above the newly poked receptacle was illuminated simultaneously with the delivery of 150 µl of water. For CODs > 0 s (6, 12, 18, or 24 s), the stimulus light above the abandoned receptacle was extinguished and a 1.9-kHz tone was pulsed for the duration of the COD. At the end of the delay, the pulsed tone was turned off and the stimulus light above the newly poked receptacle was illuminated. The first snout poke into the new location after the onset of the stimulus light resulted in the delivery of 150 µl of water. Sessions lasted for 10 min or until the rat earned a cumulative total of 5 ml of water, whichever occurred first.
The COD was constant within a session but varied between sessions (i.e., days of the week) in the following sequence (Monday through Saturday): 0, 0, 6, 12, 18, and 24 s, with the first session of the week being excluded from data analysis (i.e., the first 0 s session). This 6-day cycle was repeated four times for a total of 24 test sessions. Data from the last two cycles of sessions for each COD delay were averaged for data analysis.
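The session schedule can be laid out explicitly to show which sessions enter the analysis. The sketch below assumes that the first 0-s session is excluded in every weekly cycle and that "the last two cycles" means cycles 3 and 4; the data structure and names are illustrative only.

```python
WEEKLY_CODS_S = [0, 0, 6, 12, 18, 24]  # Monday through Saturday
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def session_plan(n_cycles=4, analyzed_cycles=2):
    """List every test session and flag the ones that are averaged for analysis."""
    plan = []
    for cycle in range(1, n_cycles + 1):
        in_analysis_window = cycle > n_cycles - analyzed_cycles
        for day_index, (day, cod_s) in enumerate(zip(DAYS, WEEKLY_CODS_S)):
            first_of_week = day_index == 0  # the excluded first 0-s session
            plan.append({
                "cycle": cycle,
                "day": day,
                "cod_s": cod_s,
                "analyzed": in_analysis_window and not first_of_week,
            })
    return plan


plan = session_plan()
analyzed = [s for s in plan if s["analyzed"]]
print(len(plan), "sessions in total;", len(analyzed), "analyzed")
```

Under these assumptions, each of the five CODs contributes two sessions to the averaged data.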

Dependent Measures
There were five primary dependent measures: rate of water reinforcement, number of patch changes, average time in a patch, average water volume rejected by leaving the patch ("rejection volume"/"indifference point"), and the deviation from optimality. The first and final patches of each session, including associated patch changes, were not included in these calculations. These patches were excluded because rats had not yet experienced the COD during the first patch, and the last patch was cut short by the end of the session and so did not accurately represent rats' choice behavior. Rate of reinforcement was calculated by summing the volume of water earned and dividing that by the time required to earn those reinforcers. Number of patch changes was the number of times rats chose to switch to the alternative receptacle ("leave" choices) averaged over the last two sessions. Time in patch was the mean duration the rat stayed at one receptacle before leaving for the other. The rejection volume/indifference point was defined as the mean amount of water (µL) available at the abandoned patch when the rat switched to the alternative snout poke location, e.g., if the rat had earned 120 µl before leaving, the rejection volume would be the next volume scheduled for delivery, 96 µl in this example. The optimal rejection volume, based on the MVT, was operationally defined as the volume of water that maximized the average reward volume rate across all patches including the COD travel time. This was calculated as the cumulative rate of return from the patch (µL/s) for each reward, considering the COD length (travel time to the patch) and the 4 s between successive rewards (Fig. 2c). For example, when the COD was 6 s, the cumulative rate of return for the first reward in a patch was 150 µL divided by 6 s (25 µL/s). For the second reward, it was [150 + 120] µL divided by [6 + 4] s (27 µL/s).
For the third reward, it was [150 + 120 + 96] µL divided by [6 + 4 + 4] s (26.14 µL/s). In this instance, an optimal strategy predicts leaving after two rewards are obtained and the rejection volume is 120 µL. The percent deviation from the optimal rejection volume (referred to as percent volume deviation) was calculated as follows: (optimal rejection volume − observed rejection volume)/optimal rejection volume x 100. Positive values indicate rats overharvested and stayed in the patch when the volume of the collected reinforcers had dropped below the value required to maximize, whereas negative values indicate the rat left when the volume was greater than the optimal volume. The same approach was used to calculate the percent deviation from optimal time in patch (referred to as percent time deviation). Negative values indicate the observed time in the patch was longer than optimal, and positive values indicate the observed time in the patch was less than optimal.
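The worked example above can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study: the function names are hypothetical, the reward sequence is limited to the three volumes given in the text (150, 120, and 96 µL), and a positive COD is assumed because the first-reward rate is undefined under this formula when the COD is 0 s.

```python
def cumulative_rates(volumes_ul, cod_s, inter_reward_s=4.0):
    """Cumulative rate of return (µL/s) after each successive reward in a patch.

    Assumption: the first reward arrives after the COD and every further
    reward arrives inter_reward_s seconds after the previous one.
    """
    if cod_s <= 0:
        raise ValueError("this sketch assumes a positive changeover delay")
    rates = []
    total_volume = 0.0
    for i, volume in enumerate(volumes_ul):
        total_volume += volume
        elapsed = cod_s + i * inter_reward_s
        rates.append(total_volume / elapsed)
    return rates


def optimal_reward_count(volumes_ul, cod_s, inter_reward_s=4.0):
    """Number of rewards to collect before leaving that maximizes the rate."""
    rates = cumulative_rates(volumes_ul, cod_s, inter_reward_s)
    return rates.index(max(rates)) + 1


def percent_deviation(optimal, observed):
    """(optimal - observed) / optimal x 100, as defined in the text."""
    return (optimal - observed) / optimal * 100.0


# Volumes taken from the worked example in the text (6-s COD).
example_volumes = [150.0, 120.0, 96.0]
example_rates = cumulative_rates(example_volumes, cod_s=6.0)
# example_rates is approximately 25, 27, and 26.14 µL/s
best_count = optimal_reward_count(example_volumes, cod_s=6.0)
```

With these inputs the rate peaks at the second reward, which matches the text's conclusion that leaving after two rewards is optimal at a 6-s COD. The same `percent_deviation` helper applies to both rejection volume and time in patch, since the text states that the same formula was used for each.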

Statistical Analysis
Statistical analyses were conducted in SPSS Statistics software (IBM, Armonk, NY). Descriptive statistics indicated that the distributions of dependent measures were normal (|skewness| < 1), so parametric statistics were used throughout.
To examine the effects of COD travel time on patch leaving, we used a mixed-factor analysis of variance (ANOVA), with delay as the within-subject factor (0, 6, 12, 18, and 24 s) and sex as the between-subject factor (male and female), in conjunction with post-hoc comparisons using Fisher's least significant difference (LSD) test for within-subject comparisons and independent samples t-tests with Bonferroni corrections for between-subject comparisons.
To examine whether analyses applied to traditional DD tasks were possible, hyperbolic equations were fitted to each rat's rejection volumes (indifference points), based on the equation described by Mazur 79, using GraphPad Prism (GraphPad Software Inc., San Diego, CA):

$$V = \frac{bA}{1 + kD}$$

where V indicates the rejected volume (µL) of the diminishing reinforcer when the rat left the current patch for the alternative patch, A represents the amount of water from the alternative patch (150 µL), and D represents the delay to receiving the 150-µL reinforcer (COD of 0, 6, 12, 18, or 24 s). The bias parameter, b, was calculated such that the product of b and A equaled each animal's indifference point at a 0-s delay 80. The discount parameter (k) is an index of the rate of discounting or overall sensitivity to delayed reinforcers, such as the first reinforcer in an alternative patch. In DD tasks, larger values of k indicate steeper discount functions, stronger aversion to delayed reinforcers, more rapid devaluation of reinforcer value by delay, and thus greater impulsive choice. Here, larger values of k indicate higher relative levels of overharvesting and a preference for the smaller, sooner rewards available in the current patch over traveling to an alternative patch. The normalized area under the curve (AUC) of the discount function was also calculated, which summarizes the influence of delay length on the choice to remain at a patch location. The AUC provides a simple measure of overharvesting/discounting that is not tied to a particular discount function 81. Smaller AUC values indicate higher levels of overharvesting. The k, b, and AUC values were analyzed using independent samples t-tests, with sex as the between-subject factor.
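As an illustration of the two discounting summaries described above, the following Python sketch evaluates a hyperbolic function of the form V = bA/(1 + kD) and computes a trapezoidal, normalized AUC. It does not reproduce the GraphPad Prism fitting procedure. The function names, the parameter values in the usage lines, and the choice to normalize volumes by A (150 µL) and delays by the longest delay are assumptions made for this example.

```python
def hyperbolic_value(amount_ul, delay_s, k, b=1.0):
    """Predicted rejection volume V = b * A / (1 + k * D)."""
    return b * amount_ul / (1.0 + k * delay_s)


def normalized_auc(delays_s, values_ul, amount_ul=150.0):
    """Trapezoidal area under the curve on normalized axes.

    Assumption: delays are expressed as a proportion of the longest delay
    and volumes as a proportion of amount_ul, so the result lies between
    0 and 1 when no value exceeds amount_ul.
    """
    max_delay = max(delays_s)
    points = sorted(zip(delays_s, values_ul))
    area = 0.0
    for (d1, v1), (d2, v2) in zip(points, points[1:]):
        width = (d2 - d1) / max_delay
        mean_height = (v1 + v2) / (2.0 * amount_ul)
        area += width * mean_height
    return area


# Hypothetical parameters for a single animal, for illustration only.
delays = [0.0, 6.0, 12.0, 18.0, 24.0]
predicted = [hyperbolic_value(150.0, d, k=0.05, b=0.8) for d in delays]
auc = normalized_auc(delays, predicted)
```

Because the AUC is computed directly from the observed points, it can be obtained from each rat's rejection volumes without first fitting the hyperbolic model, which is why the text describes it as not tied to a particular discount function.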
Composite scores for all variables were calculated as the sum of the variable across all delays. These composite scores were used to calculate Pearson's correlation coefficients. To assess the difference in the strength of their association with DD and reward maximization, correlation coefficients were compared using Fisher r-to-z transformations 29,30, which are recommended for comparing correlation coefficients from the same sample with one variable in common 82. For all statistical tests, p < .05 was used as the alpha criterion.
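The comparison of two correlations that share one variable can be sketched as follows. The text specifies only that Fisher r-to-z transformations were used; the specific test statistic below (the z-test of Meng, Rosenthal, and Rubin, 1992) is one commonly used test of this kind, and treating it as the procedure applied here is an assumption. The function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation."""
    return math.atanh(r)


def compare_overlapping_correlations(r_xy, r_xz, r_yz, n):
    """Compare r_xy with r_xz when both share variable x in one sample.

    Assumption: uses the Meng, Rosenthal & Rubin (1992) z-test, where r_yz
    is the correlation between the two non-shared variables and n is the
    sample size. Returns the z statistic and a two-sided p value.
    """
    mean_r2 = (r_xy ** 2 + r_xz ** 2) / 2.0
    f = min((1.0 - r_yz) / (2.0 * (1.0 - mean_r2)), 1.0)
    h = (1.0 - f * mean_r2) / (1.0 - mean_r2)
    z = (fisher_z(r_xy) - fisher_z(r_xz)) * math.sqrt(
        (n - 3) / (2.0 * (1.0 - r_yz) * h)
    )
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p


# Hypothetical correlations, for illustration only.
z_stat, p_value = compare_overlapping_correlations(0.40, 0.20, 0.30, n=500)
```

The statistic is positive when the first correlation is larger than the second and is zero when the two are equal, in which case the two-sided p value is 1.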

Declarations

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions
The study was conceived by A.P., P.M., J.B.R., and K.I. The experimental procedure and general study design were created by J.B.R. The experimental protocols were implemented and conducted by K.I.,

Schematic Illustration of the Sequential Patch Depletion Procedure. (a) Rats were offered a sequential choice between two "patches" (snout poke receptacles). "Stay" choices (snout poke response to the same receptacle) resulted in presentation of decreasing volumes of water (r = reward; µL = microliters of water). "Leave" choices (snout poke response to the alternate receptacle) were followed by a changeover delay (travel cost) and presentation of the initial larger volume of water (150 µL). Following a "leave" choice, the abandoned patch was replenished to the original reward volume (150 µL). See text for full explanation.

Patch Utilization. These data show the performance of heterogeneous stock rats on the sequential patch depletion procedure across various delays (0, 6, 12, 18, and 24 s). Aqua lines represent male HS rats and blue lines represent female HS rats. (a) Rate of water reinforcement across all delays. Males had a significantly higher rate of reinforcement than females at all delays tested. (b) Number of patch changes in the last two sessions. Females switched patches significantly fewer times (made fewer "leave" choices) than males at all delays. (c) Mean time spent in the patch (snout poke receptacle) before leaving for the alternate patch. Females stayed in the patch significantly longer at the 6- and 12-s delays. Data are expressed as the average ± standard error; * p < 0.05, ** p < 0.01, *** p < 0.001 between males (n = 896) and females (n = 898).

Figure 5
Reward Maximization. (a) Percent deviation from optimal time in patch. Observed time in patch was compared to the optimal stay time for reward maximization. Negative values indicate staying longer than optimal, whereas positive values indicate leaving before optimal. Male heterogeneous stock rats showed significantly greater reward optimization at delays of 0-6 s, whereas female rats showed significantly greater reward maximization at delays of 12-18 s. (b) Percent deviation from optimal rejection volume. Observed rejection volume (indifference point) was compared to the optimal rejection volume for reward maximization. Positive values indicate a rejection volume less than optimal, and negative values indicate a rejection volume greater than optimal. Males showed significantly greater reward optimization at delays of 6 and 12 s, whereas females showed significantly greater reward maximization at a delay of 24 s. Data are expressed as the average ± standard error; * p < 0.05, ** p < 0.01, *** p < 0.001 between males (n = 896) and females (n = 898).